Joey Hess [Mon, 3 Mar 2025 15:59:04 +0000 (11:59 -0400)]
support git files as input to computations
This uses GIT keys, like those used when exporting git files to special
remotes. Except here the GIT key refers to a file checked into the git
repo.
Note that, since the compute remote uses catObject to get the content,
a symlink that is checked into git does not get followed. This is important
for security, because following a symlink and adding the content to the
repo as an annex object would allow exfiltrating content from outside
the repository.
Instead, the behavior with a symlink is to run the computation on the
symlink target. This may turn out to be confusing, and it might be worth
having addcomputed check if the file in git is a symlink and error out.
Or it could follow symlinks as long as the destination is a file in the
repository.
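To illustrate why a checked-in symlink is not followed: git stores a symlink as a blob whose content is the link target path, so reading the object by its sha yields that path string, not the file it points at. A minimal Python sketch of git's blob hashing follows. The function names are illustrative and not git-annex code, and rendering the key as "GIT--" plus the sha is an assumption made for this example.

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    """Hash content the way git hashes a blob: sha1 over a header plus the bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

def git_key_for(content: bytes) -> str:
    # Assumption: a GIT key is rendered as "GIT--" followed by the blob sha.
    return "GIT--" + git_blob_sha1(content)

# For a symlink, the blob content is the link target itself, so the key
# identifies the path string and never the content outside the repository.
symlink_blob = b"../outside/secret.txt"
print(git_key_for(symlink_blob))
print(git_key_for(b"hello\n"))
```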
Joey Hess [Mon, 3 Mar 2025 15:08:36 +0000 (11:08 -0400)]
factor out Annex.GitShaKey
Joey Hess [Mon, 3 Mar 2025 14:57:56 +0000 (10:57 -0400)]
record VURL key hashes in addcomputed and recompute
Joey Hess [Thu, 27 Feb 2025 20:19:41 +0000 (16:19 -0400)]
record VURL key hashes when getting from compute remote
Like when getting from the web special remote, when the output of the
computation has changed, record the new hash of the content as an
equivalent key for the VURL key.
Still needs to be done for addcomputed and recompute.
Joey Hess [Thu, 27 Feb 2025 20:18:04 +0000 (16:18 -0400)]
fix build
Joey Hess [Thu, 27 Feb 2025 20:17:42 +0000 (16:17 -0400)]
refactor
Joey Hess [Thu, 27 Feb 2025 19:12:29 +0000 (15:12 -0400)]
many recompute improvements
I've lost track of them all, but it includes:
* Using the same key backend as was used in the original computation.
* Fixing bug that prevented updating the source file key in the compute
state
* Handling --reproducible and --unreproducible.
* recompute --original of a file using VURL, when the result is
different, but the key remains the same, makes the object file
be updated with the new content
* Detecting some other ways the program behavior can change, just for
completeness.
* Also adds --backend to addcomputed.
Joey Hess [Thu, 27 Feb 2025 18:54:03 +0000 (14:54 -0400)]
refactoring
Joey Hess [Thu, 27 Feb 2025 15:10:44 +0000 (11:10 -0400)]
fix recompute of renamed files
When a computed file has been renamed, a recompute needs to write to the
new filename.
I decided to remove --others because it's not clear what it should do in
the face of renames. Should it update only other files that have not
been renamed? Or update files that use the old key to the new key
anywhere in the tree? Or write the other files to the cwd, ignoring
renames? Since --others is just a way to save on compute time, adding
this complexity at this point seems like a bad idea. May revisit later.
Added temporary TODO-compute file
Joey Hess [Wed, 26 Feb 2025 19:59:47 +0000 (15:59 -0400)]
todo
Joey Hess [Wed, 26 Feb 2025 19:51:31 +0000 (15:51 -0400)]
recompute closer to working properly
Proper behavior without --others implemented.
And eliminated most of the code duplication through refactoring.
Also, changed it to not stage recomputed files. This way, git diff will
show files that have differences.
Joey Hess [Wed, 26 Feb 2025 18:05:37 +0000 (14:05 -0400)]
refactor
Joey Hess [Wed, 26 Feb 2025 15:25:32 +0000 (11:25 -0400)]
started git-annex recompute
The perform action of this still needs work to do the right thing.
In particular, it currently behaves as if --others was always set.
And, it duplicates a lot of code from addcomputed.
Joey Hess [Wed, 26 Feb 2025 13:47:56 +0000 (09:47 -0400)]
showOutput
when the compute program eg displays usage, it needs to start on its own
line
Joey Hess [Wed, 26 Feb 2025 13:45:35 +0000 (09:45 -0400)]
addcomputed inherits extra initremote parameters
This is limited because the remote config is a field/value map. So order
is not preserved, and when 2 parameters have the same field name, only
the last one will be passed.
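The limitation above can be sketched in Python. This is only an illustration of how a field/value map loses ordering and duplicate fields; the parameter names are made up, and the sort stands in for a map that does not remember insertion order.

```python
def parse_params(args):
    """Fold "field=value" strings into a map keyed by field name."""
    config = {}
    for arg in args:
        field, _, value = arg.partition("=")
        config[field] = value  # a repeated field overwrites the earlier value
    # Sort by field name to mimic a map that does not keep the original order.
    return dict(sorted(config.items()))

# Hypothetical parameters: "passes" is given twice, and "mode" comes second.
params = ["passes=2", "mode=fast", "passes=5"]
print(parse_params(params))
```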
Joey Hess [Tue, 25 Feb 2025 22:45:55 +0000 (18:45 -0400)]
todo
Joey Hess [Tue, 25 Feb 2025 22:44:40 +0000 (18:44 -0400)]
add compute remote uuid to compute state url
Otherwise, two different compute remotes that happen to take the same
input would use the same compute state url. Which seems wrong.
Joey Hess [Tue, 25 Feb 2025 21:26:28 +0000 (17:26 -0400)]
wording
Joey Hess [Tue, 25 Feb 2025 21:23:38 +0000 (17:23 -0400)]
update demo program
needed a mkdir
Joey Hess [Tue, 25 Feb 2025 21:10:41 +0000 (17:10 -0400)]
use compute program REPRODUCIBLE by default
Joey Hess [Tue, 25 Feb 2025 21:00:00 +0000 (17:00 -0400)]
ingest when --unreproducible is used without --fast
Joey Hess [Tue, 25 Feb 2025 20:36:22 +0000 (16:36 -0400)]
addcomputed --fast and --unreproducible working
For these, use VURL and URL keys, with an "annex-compute:" URI prefix.
These URL keys will look something like this:
URL--annex-compute&cbar4,63pconvert,3-f4d3d72cf3f16ac9c3e9a8012bde4462
Generally it's too long so most of it gets md5summed. It's a little
ugly, but it's what fell out of the existing URL key generation
machinery. I did consider special casing to eg
"URL--annex-compute&c4d3d72cf3f16ac9c3e9a8012bde4462". But it seems at
least possibly useful that the name of the file that was computed is
visible and perhaps one or two words of the git-annex compute command
parameters.
Note that two different output files from the same computation will get
the same URL key. And these keys should remain stable.
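The general idea of keeping a readable prefix and replacing the rest with an md5 can be sketched as follows. This is not the existing URL key generation machinery: the length limit, the separator, and the function name are all assumptions chosen for the example.

```python
import hashlib

MAX_LEN = 64  # hypothetical limit, chosen only for this sketch

def shorten_key_name(name: str, max_len: int = MAX_LEN) -> str:
    """Keep a readable prefix and replace the rest with an md5 of the full name."""
    if len(name) <= max_len:
        return name
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()  # 32 hex characters
    keep = max_len - len(digest) - 1  # room left for the readable prefix
    return name[:keep] + "-" + digest

# Hypothetical long name; the same input always shortens to the same result.
long_name = "annex-compute:" + "convert input.png output.jpg " * 5
print(shorten_key_name(long_name))
```

Because the digest covers the whole name, the shortened form stays stable for as long as the input does.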
Joey Hess [Tue, 25 Feb 2025 19:45:14 +0000 (15:45 -0400)]
add git-annex addcomputed
Working pretty well. Mostly. But:
* Does not yet support inputs that are non-annexed files checked into git
* --fast is currently broken (will need something like VURL keys)
* --unreproducible still uses a checksumming backend, so drop and get
again will likely fail (needs probably to use an URL key or something
like one)
The compute special remote seems to work pretty well too. Eg,
getting from it works, and dropping content that is present in it works.
Joey Hess [Tue, 25 Feb 2025 19:08:38 +0000 (15:08 -0400)]
handle computations in subdirs of the git repository
Eg, a computation might be run in "foo/" and refer to "../bar" as an
input or output.
So, the subdir is part of the computation state.
Also, prevent input or output of files that are outside the git
repository. Of course, the program can access any file on disk if it
wants to; this is just a guard against mistakes. And it may also be
useful if the program communicates with something less trusted than it,
eg a container image, so input/output files communicated by that are not
the source of security problems.
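A guard like the one described above can be sketched in Python as a purely lexical check. The function name is hypothetical, and this sketch does not resolve symlinks on disk, so it only catches mistakes such as too many "../" components.

```python
import posixpath

def resolve_in_repo(subdir: str, path: str):
    """Resolve path relative to subdir (itself relative to the repository top).

    Returns the repository-relative path, or None when the path would
    escape the repository.
    """
    if posixpath.isabs(path):
        return None
    joined = posixpath.normpath(posixpath.join(subdir, path))
    if joined == ".." or joined.startswith("../"):
        return None
    return joined

# A computation run in "foo/" may refer to "../bar", which is still inside.
print(resolve_in_repo("foo", "../bar"))
# Two levels up from "foo/" leaves the repository, so it is rejected.
print(resolve_in_repo("foo", "../../etc/passwd"))
```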
Joey Hess [Mon, 24 Feb 2025 20:39:55 +0000 (16:39 -0400)]
add field desc
Joey Hess [Mon, 24 Feb 2025 20:15:04 +0000 (16:15 -0400)]
update for new interface
Joey Hess [Mon, 24 Feb 2025 19:48:42 +0000 (15:48 -0400)]
reimplement using new compute program interface
Joey Hess [Mon, 24 Feb 2025 17:48:46 +0000 (13:48 -0400)]
support addcomputed --fast
This complicates the interface but it's still simpler to understand than
the old interface.
Joey Hess [Mon, 24 Feb 2025 16:41:25 +0000 (12:41 -0400)]
new compute program interface
This is much more flexible, and also simpler to understand.
Joey Hess [Fri, 21 Feb 2025 19:09:46 +0000 (15:09 -0400)]
update
Joey Hess [Fri, 21 Feb 2025 19:02:53 +0000 (15:02 -0400)]
compute special remote mostly implemented
Except for some of the hard parts: progress displays, incremental
verification, and getting inputs before running a computation.
Untested! In order to test this, git-annex addcomputed needs to be
implemented.
Joey Hess [Fri, 21 Feb 2025 18:51:02 +0000 (14:51 -0400)]
remove unused adjustedBranchRefresh associated file parameter
Joey Hess [Thu, 20 Feb 2025 17:29:05 +0000 (13:29 -0400)]
wip
Joey Hess [Thu, 20 Feb 2025 17:27:59 +0000 (13:27 -0400)]
update
Joey Hess [Thu, 20 Feb 2025 17:27:47 +0000 (13:27 -0400)]
wip
Joey Hess [Wed, 19 Feb 2025 20:03:34 +0000 (16:03 -0400)]
update
Joey Hess [Wed, 19 Feb 2025 19:14:52 +0000 (15:14 -0400)]
comments
Joey Hess [Wed, 19 Feb 2025 18:29:18 +0000 (14:29 -0400)]
documentation for compute remote and associated commands
None of this is implemented yet.
Joey Hess [Wed, 19 Feb 2025 18:16:36 +0000 (14:16 -0400)]
add REPRODUCIBLE
Joey Hess [Wed, 19 Feb 2025 16:32:35 +0000 (12:32 -0400)]
optional and required inputs and some other changes
Joey Hess [Tue, 18 Feb 2025 19:46:47 +0000 (15:46 -0400)]
improved draft design
Joey Hess [Tue, 18 Feb 2025 18:46:10 +0000 (14:46 -0400)]
improve apiurl description
Joey Hess [Tue, 18 Feb 2025 18:11:11 +0000 (14:11 -0400)]
git-lfs apiurl parameter
git-lfs: Added an optional apiurl parameter.
This needs version 1.2.5 of the haskell git-lfs library to be used.
stack.yaml updated to use that.
Note that git-annex enableremote can be used to add apiurl= to an existing
git-lfs special remote. To allow unsetting the apiurl and instead use
the probed url, support enableremote with apiurl set to an empty string.
Sponsored-by: Luke T. Shumaker
sharad [Mon, 17 Feb 2025 19:30:28 +0000 (19:30 +0000)]
Added a comment: Faced same issue for long time
Joey Hess [Mon, 17 Feb 2025 18:56:56 +0000 (14:56 -0400)]
OsPath build fix
Joey Hess [Mon, 17 Feb 2025 18:46:43 +0000 (14:46 -0400)]
OsPath build fix
Joey Hess [Mon, 17 Feb 2025 18:06:06 +0000 (14:06 -0400)]
OSX build fix
Joey Hess [Mon, 17 Feb 2025 18:05:19 +0000 (14:05 -0400)]
OSX build fixes
Joey Hess [Mon, 17 Feb 2025 18:04:08 +0000 (14:04 -0400)]
OSX build fixes
Joey Hess [Mon, 17 Feb 2025 18:01:54 +0000 (14:01 -0400)]
OSX build fix
Joey Hess [Mon, 17 Feb 2025 17:59:52 +0000 (13:59 -0400)]
OSX build fixes
Joey Hess [Mon, 17 Feb 2025 15:58:20 +0000 (11:58 -0400)]
Merge branch 'ospath'
datamanager [Sat, 15 Feb 2025 21:46:33 +0000 (21:46 +0000)]
Added a comment
puck [Sat, 15 Feb 2025 10:36:03 +0000 (10:36 +0000)]
Joey Hess [Fri, 14 Feb 2025 20:53:00 +0000 (16:53 -0400)]
OsPath conversion for OSXMkLibs
Joey Hess [Fri, 14 Feb 2025 20:28:43 +0000 (16:28 -0400)]
Merge branch 'master' into ospath
Joey Hess [Fri, 14 Feb 2025 19:41:23 +0000 (15:41 -0400)]
Merge branch 'master' of ssh://git-annex.branchable.com
Joey Hess [Fri, 14 Feb 2025 19:40:48 +0000 (15:40 -0400)]
further fix OSX packaging program builds
Broken by commit
e5be81f8d4bf7f6cef5ac4ff0b059efbdf6055ea
anarcat [Fri, 14 Feb 2025 17:54:24 +0000 (17:54 +0000)]
more details on my issues
anarcat [Fri, 14 Feb 2025 17:51:29 +0000 (17:51 +0000)]
Added a comment: similar topic
anarcat [Fri, 14 Feb 2025 17:47:02 +0000 (17:47 +0000)]
Added a comment: similar topic
Joey Hess [Thu, 13 Feb 2025 20:12:07 +0000 (16:12 -0400)]
draft
Joey Hess [Thu, 13 Feb 2025 17:51:21 +0000 (13:51 -0400)]
comment
Joey Hess [Thu, 13 Feb 2025 17:01:15 +0000 (13:01 -0400)]
comment
Joey Hess [Wed, 12 Feb 2025 17:27:34 +0000 (13:27 -0400)]
OsPath conversion of DistributionUpdate
Joey Hess [Wed, 12 Feb 2025 17:11:27 +0000 (13:11 -0400)]
push down OsPath into CopyFile
Joey Hess [Wed, 12 Feb 2025 16:59:30 +0000 (12:59 -0400)]
stop exporting RawFilePath
Joey Hess [Wed, 12 Feb 2025 16:43:03 +0000 (12:43 -0400)]
avoid head warnings with recent ghc versions
Joey Hess [Wed, 12 Feb 2025 16:37:36 +0000 (12:37 -0400)]
remove the git-union-merge command
This has never been built and shipped as part of git-annex,
and including it as a pedagogical example in
the source code doesn't have much benefit. The program was not currently
buildable after recent OsPath changes.
Of course, Git/UnionMerge.hs is still available and can be used.
Joey Hess [Wed, 12 Feb 2025 16:32:22 +0000 (12:32 -0400)]
fix description of ParallelBuild
Joey Hess [Tue, 11 Feb 2025 20:57:32 +0000 (16:57 -0400)]
Revert "stack.yaml: temporarily build with older ghc"
This reverts commit
2f9a384e48cb4407e6b5b70d1db6efa593654f0e.
Joey Hess [Tue, 11 Feb 2025 20:56:17 +0000 (16:56 -0400)]
Merge branch 'master' into ospath
Joey Hess [Tue, 11 Feb 2025 20:53:01 +0000 (16:53 -0400)]
fix windows and OSX packaging program builds
Broken by commit
e5be81f8d4bf7f6cef5ac4ff0b059efbdf6055ea
Joey Hess [Tue, 11 Feb 2025 20:46:01 +0000 (16:46 -0400)]
Merge branch 'ospathwin2' into ospath
Joey Hess [Wed, 12 Feb 2025 04:37:40 +0000 (20:37 -0800)]
fix convertToWindowsNativeNameSpace bug
This fixes a test suite failure. The OsPath conversion made that be used
in more places, including addurl, which exposed an existing bug.
Joey Hess [Tue, 11 Feb 2025 20:30:47 +0000 (16:30 -0400)]
avoid build warning on windows
Joey Hess [Wed, 12 Feb 2025 03:23:02 +0000 (19:23 -0800)]
OsPath transition Windows build fixes
This gets it building on Windows again, with 1 test suite failure
(addurl).
Sponsored-by: Kevin Mueller
Joey Hess [Tue, 11 Feb 2025 18:07:01 +0000 (14:07 -0400)]
fix comment
Joey Hess [Tue, 11 Feb 2025 18:05:56 +0000 (14:05 -0400)]
improved OsPath conversion
Joey Hess [Tue, 11 Feb 2025 18:03:20 +0000 (14:03 -0400)]
more OsPath conversion
this avoids 1 copy
Joey Hess [Tue, 11 Feb 2025 18:00:01 +0000 (14:00 -0400)]
more OsPath conversion
Joey Hess [Tue, 11 Feb 2025 17:54:17 +0000 (13:54 -0400)]
use to/fromOsPath
Just to reduce the number of from/toRawFilePath calls, which I would
like to minimize.
In this build path, the two are the same though.
Joey Hess [Tue, 11 Feb 2025 17:49:17 +0000 (13:49 -0400)]
remove unused functions from Utility.RawFilePath
Joey Hess [Tue, 11 Feb 2025 17:41:26 +0000 (13:41 -0400)]
replace removeLink with removeFile
same reasoning as in commit
5cc8d9d03b53f2e43d51e4f612f423178519e824
Joey Hess [Tue, 11 Feb 2025 17:01:13 +0000 (13:01 -0400)]
update todo
Joey Hess [Tue, 11 Feb 2025 16:46:14 +0000 (12:46 -0400)]
replace R.doesPathExist with doesPathExist
Equivalent, just avoids some ugliness.
Joey Hess [Tue, 11 Feb 2025 16:37:09 +0000 (12:37 -0400)]
test suite now passes after OsPath conversion
The test suite was failing because of a bug in the Database/* modules.
I had replaced doesPathExist with doesDirectoryExist, but it was
checking the database file.
I have audited commit f1ba21d698c908ad84c08bce24fbbc376190fe83 for
other changes to doesPathExist, and checked that doesDirectoryExist and
doesFileExist were used correctly.
The only change I found is in youtubeDl', where it used to return
directories that might have been created by youtube-dl. But it was
supposed to return media files, so changing it to use doesFileExist is
actually an improvement. Although only of theoretical benefit.
Note that it would actually be possible to keep using doesPathExist,
there is a version of that for OsPath as well. But the rest of these
changes seem safe.
Sponsored-by: Nicholas Golder-Manning
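The doesPathExist mix-up described above has a close analogue in Python's os.path functions, which makes the failure mode easy to reproduce. This is only an illustration; the file name is made up and no git-annex code is involved.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    db_file = os.path.join(tmp, "db")  # stands in for a database file
    open(db_file, "w").close()

    checks = {
        "exists": os.path.exists(db_file),  # analogous to doesPathExist
        "isdir": os.path.isdir(db_file),    # analogous to doesDirectoryExist
        "isfile": os.path.isfile(db_file),  # analogous to doesFileExist
    }

# The directory check reports False for a regular file, so code that swapped
# it in for the generic existence check would treat the database as missing.
print(checks)
```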
Joey Hess [Tue, 11 Feb 2025 16:12:27 +0000 (12:12 -0400)]
OsPath conversion of linuxstandalone builder
Sponsored-by: Joshua Antonishen
Joey Hess [Mon, 10 Feb 2025 21:23:31 +0000 (17:23 -0400)]
Merge branch 'master' of ssh://git-annex.branchable.com
Joey Hess [Mon, 10 Feb 2025 21:22:29 +0000 (17:22 -0400)]
stack.yaml: temporarily build with older ghc
And without ospath build flag as a consequence.
This is a temporary fix to build failures on the github CI for Windows
and OSX, which use too old a version of stack to support the nightly
ghc.
I have sent a patch to those workflows, and after it is applied, this
can be reverted.
Joey Hess [Mon, 10 Feb 2025 20:25:31 +0000 (16:25 -0400)]
OsPath build flag no longer depends on filepath-bytestring
However, filepath-bytestring is still in Setup-Depends.
That's because Utility.OsPath uses it when not built with OsPath.
It would be maybe possible to make Utility.OsPath fall back to using
filepath, and eliminate that dependency too, but it would mean either
wrapping all of System.FilePath's functions, or using `type OsPath = FilePath`
Annex.Import uses ifdefs to avoid converting back to FilePath when not
on windows. On windows it's a bit slower due to that conversion.
Utility.Path.Windows.convertToWindowsNativeNamespace got a bit
slower too, but not really worth optimising I think.
Note that importing Utility.FileSystemEncoding at the same time as
System.Posix.ByteString will result in conflicting definitions for
RawFilePath. filepath-bytestring avoids that by importing RawFilePath
from System.Posix.ByteString, but that's not possible in
Utility.FileSystemEncoding, since Setup-Depends does not include unix.
This turned out not to affect any code in git-annex though.
Sponsored-by: Leon Schuermann
Joey Hess [Mon, 10 Feb 2025 19:40:04 +0000 (15:40 -0400)]
merging the two lines of OsPath conversion commits
Joey Hess [Mon, 10 Feb 2025 19:24:28 +0000 (15:24 -0400)]
OsPath conversion
While some RawFilePath and FilePath remain, this converts most of
git-annex to using OsPath.
(When built without the OsPath build flag, it falls back to using
type OsPath = RawFilePath.)
The goals are
1) improved performance by using OsPath end-to-end when possible
2) potentially avoiding memory use problems caused by pinned strict
ByteString, since OsPath uses ShortByteString
3) eventually eliminating the filepath-bytestring dependency so I don't
need to keep maintaining that library
(this doesn't get all the way, but close)
4) generally improved type safety, since OsPath is a newtype, while
FilePath and RawFilePath are just type aliases.
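The type safety point in goal 4 can be illustrated with a loose Python analogy. A Haskell newtype is checked at compile time, while this sketch checks at runtime; the WrappedPath class and file_name function are hypothetical names used only here.

```python
from dataclasses import dataclass

# A plain alias: any bytes value is accepted wherever RawPath is expected.
RawPath = bytes

@dataclass(frozen=True)
class WrappedPath:
    """Hypothetical distinct path type, standing in for a newtype."""
    raw: bytes

def file_name(path: WrappedPath) -> bytes:
    # Reject values that were never explicitly converted to a path.
    if not isinstance(path, WrappedPath):
        raise TypeError("expected WrappedPath, got " + type(path).__name__)
    return path.raw.rsplit(b"/", 1)[-1]

print(file_name(WrappedPath(b"foo/bar.txt")))
```

With only the alias, arbitrary bytes could be passed as a path unnoticed; the wrapper forces an explicit conversion at the boundary.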
This is the result of a type checker driven process. I started by
converting from System.Directory to System.Directory.OsPath, and from
System.FilePath to System.OsPath. Then I fixed all the compile errors,
which took 3 weeks of work.
Unfortunately, there are several test suite failures at this point.
Also, it has only been built on linux; on windows and OSX there are
probably ifdefs whose code still needs to be converted.
Note that there is a parallel line of commits, starting with
05bdce328d890cbac68a8627aaae262078a8290a, which is the incremental
progress as I worked on this. It will be merged with this commit. In
some cases, commits in that line explain in more detail the reasons for
some specific changes.
Joey Hess [Mon, 10 Feb 2025 19:18:10 +0000 (15:18 -0400)]
fix reversions
Oops, in 0b9e9cbf70c6375c8ccccdfac95b5e04ca09f891 I lost takeDirectory
in several places.
With this fixed, the test suite no longer utterly blows up, but still
fails in 7 places due to other bugs introduced in the OsPath conversion.
Sponsored-by: Graham Spencer
Joey Hess [Mon, 10 Feb 2025 18:57:25 +0000 (14:57 -0400)]
more OsPath conversion (749/749)
Builds with and without OsPath build flag.
Unfortunately, the test suite fails.
Sponsored-by: unqueued on Patreon
Joey Hess [Mon, 10 Feb 2025 16:33:21 +0000 (12:33 -0400)]
don't export pack and unpack
These are too widely used for other things to make sense to export OsPath
versions of them. And OsString also provides them and gets imported
qualified when needed.
Joey Hess [Sat, 8 Feb 2025 19:17:33 +0000 (15:17 -0400)]
more OsPath conversion (658/749)
At this point the test suite builds, and mostly the assistant is left.
Sponsored-by: unqueued
thk [Sat, 8 Feb 2025 06:59:34 +0000 (06:59 +0000)]
thk [Sat, 8 Feb 2025 06:56:32 +0000 (06:56 +0000)]
Added a comment: iroh
Joey Hess [Fri, 7 Feb 2025 21:03:31 +0000 (17:03 -0400)]
more OsPath conversion (650/749)
Sponsored-by: Nicholas Golder-Manning